Prevent StepExecutionIterator from leaking memory in cases where a single processed execution has a stuck CPS VM thread #347
…<List<Void>> instead of Future<List<StepExecution>>
StepExecutionIteratorImpl$applyAll to Future<List<Void>> instead of Future<List<StepExecution>>
…R) and simplify memory leak Pipeline
notStuckBuild = null; // Clear out the local variable in this thread.
Jenkins.get().getQueue().clearLeftItems(); // Otherwise we'd have to wait 5 minutes for the cache to be cleared.
// Make sure that the reference can be GC'd.
MemoryAssert.assertGC(notStuckBuildRef, true);
Revert the fix and this will fail. The PR description shows the reference path preventing the build from being cleaned up.
}
return null;
}, MoreExecutors.directExecutor());
ListenableFuture<Void> resultsWithWarningsLogged = Futures.catching(results, Throwable.class, t -> {
Yikes. Project Loom would be very welcome in code like this.
Looking more carefully through Guava's Javadoc, I think I can use FluentFuture to simplify this.
Actually, FluentFuture doesn't really make things any clearer, so I will leave it.
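For readers unfamiliar with the transform-then-catch chain discussed in this thread, here is a minimal sketch of the same shape using the JDK's CompletableFuture rather than Guava's ListenableFuture (a deliberate substitution so that the example has no dependencies). The class DiscardingResults, the method processAndDiscard, and the WARNINGS counter are hypothetical names made up for illustration; this is not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical illustration only; not the plugin's implementation. */
final class DiscardingResults {
    /** Counts logged warnings so that the behavior is observable. */
    static final AtomicInteger WARNINGS = new AtomicInteger();

    /**
     * Applies {@code action} to each element once {@code source} completes.
     * The returned future completes with null, so its result does not refer to the list.
     * Failures are logged and swallowed rather than propagated.
     */
    static <T> CompletableFuture<Void> processAndDiscard(
            CompletableFuture<List<T>> source, Consumer<? super T> action) {
        return source
                .thenAccept(items -> items.forEach(action)) // result type becomes Void
                .exceptionally(t -> {
                    WARNINGS.incrementAndGet();
                    System.err.println("Failed to process executions: " + t);
                    return null;
                });
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        CompletableFuture<Void> ok =
                processAndDiscard(CompletableFuture.completedFuture(List.of("a", "b")), seen::add);
        System.out.println(seen + " -> " + ok.join());

        CompletableFuture<List<String>> failed =
                CompletableFuture.failedFuture(new IllegalStateException("boom"));
        CompletableFuture<Void> recovered = processAndDiscard(failed, seen::add);
        System.out.println("completed normally: " + !recovered.isCompletedExceptionally());
    }
}
```

The point of the Void result type is that anything aggregating these futures has no per-build result to hold on to once each one has been processed.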
var stuck = r.createProject(WorkflowJob.class, "stuck");
stuck.setDefinition(new CpsFlowDefinition("blockSynchronously 'stuck'", false));
var stuckBuild = stuck.scheduleBuild2(0).waitForStart();
await().atMost(5, TimeUnit.SECONDS).until(() -> SynchronousBlockingStep.isStarted("stuck"));
Or waitForMessage would probably suffice.
await().atMost(5, TimeUnit.SECONDS).until(() -> SynchronousBlockingStep.isStarted("stuck"));
// Make FlowExecutionList$StepExecutionIteratorImpl.applyAll submit a task to the CpsVmExecutorService
// for stuck #1 that will never complete, so the resulting future will never complete.
StepExecution.applyAll(e -> null);
Should this return value be kept in a local variable?
It's not necessary for the leak, if that's what you mean. Even if the StepExecution.applyAll caller completely ignores the return value, the stuck CPS VM thread holds a strong reference to the aggregate future because the SingleLaneExecutorService's task queue has a reference to the SettableFuture for the getCurrentExecutions call, and then that future references the aggregate future via a listener just due to the implementation of Futures.allAsList.
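A rough stand-alone illustration of that situation, assuming a plain single-threaded ThreadPoolExecutor as a stand-in for SingleLaneExecutorService and CompletableFuture.allOf as a stand-in for Futures.allAsList (both are substitutions, and StuckLaneDemo is a hypothetical name). It only shows the observable symptoms: the task for the stuck lane stays queued forever, and the aggregate never completes even though the caller keeps no reference to it and the healthy constituent is already done.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; the real classes are Jenkins and Guava types. */
final class StuckLaneDemo {
    static String demo() {
        // One worker thread with an unbounded queue: a stand-in for a single-lane executor.
        ThreadPoolExecutor stuckLane =
                new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch never = new CountDownLatch(1);
        try {
            // Occupies the only thread and never finishes on its own.
            stuckLane.execute(() -> {
                try {
                    never.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // lets shutdownNow() end the demo
                }
            });
            // Queued behind the stuck task, so this future can never complete.
            CompletableFuture<Void> fromStuck = CompletableFuture.runAsync(() -> {}, stuckLane);
            // A healthy "build" whose work is already finished.
            CompletableFuture<Void> fromHealthy = CompletableFuture.completedFuture(null);
            // The aggregate registers a callback on fromStuck, so the executor's queue
            // reaches it: queue -> queued task -> fromStuck -> callback -> aggregate.
            CompletableFuture<Void> aggregate = CompletableFuture.allOf(fromStuck, fromHealthy);
            return "queued=" + stuckLane.getQueue().size()
                    + " aggregateDone=" + aggregate.isDone()
                    + " healthyDone=" + fromHealthy.isDone();
        } finally {
            stuckLane.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the aggregate carries only Void results. If it instead carried each constituent's result list, everything already delivered by the healthy constituents would stay reachable for as long as the stuck lane's queue exists.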
…st method Co-authored-by: Jesse Glick <[email protected]>
I recently saw a real-world case (ref. SECO-3962) where a Pipeline build’s flow graph had a loop due to some type of corruption (I suspect a crash while the Pipeline was running led to a build.xml whose iota and head were older than the most recent FlowNode in workflow/ and its StepExecution in program.dat), which caused its CpsVmExecutorService to get stuck in an infinite loop on a single task. This caused all calls to StepExecution.applyAll to leak all WorkflowRuns that were live at the time of the call, because the StepExecutions for those builds were strongly reachable from the CPS VM thread for the stuck build via the SingleLaneExecutorService task queue for the CpsVmExecutorService and the internals of Guava’s Futures.allAsList.
Here is the leak path (produced via MemoryAssert)
There are a lot of things that could be improved here, but I am not sure where our effort is best spent given that I have not previously seen corruption cause an infinite loop like this:
… StepExecutionIteratorImpl.applyAll to ListenableFuture<List<Void>> instead of ListenableFuture<List<StepExecution>>. The result type in the API is ListenableFuture<?>, and the return value is only intended to be used to track progress and cancel the operation if needed, so this change should be compatible. This change means the combined future drops its references to each build's StepExecutions as soon as they have been processed, rather than having to hold a reference to them until all builds' StepExecutions have been processed.
… LinearBlockHoppingScanner, LinearScanner, DepthFirstScanner, ForkScanner, StandardGraphLookupView.bruteForceScanForEnd, etc., and throw an exception whenever any infinite loop is encountered.
… CpsVmExecutorService to warn when a build’s SingleLaneExecutorService seems to be having problems, e.g. the queue size is only increasing, the executing task has been live for more than 15 minutes, tasks are starting more than 15 minutes after they were submitted, etc.
… metrics, etc.
Testing done
Submitter checklist
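As a rough sketch of the idea raised in the description, that graph scanners could throw instead of spinning when the flow graph contains a loop, the following walks a parent chain and fails fast on a revisit. It uses a plain map of string IDs rather than real FlowNode objects; LoopDetectingWalk, walk, and parentOf are hypothetical names, and the actual scanners are structured differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not how the workflow-api scanners are implemented. */
final class LoopDetectingWalk {
    /**
     * Walks from {@code start} toward the root by following {@code parentOf},
     * throwing if any node is reached a second time.
     */
    static List<String> walk(String start, Map<String, String> parentOf) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String node = start; node != null; node = parentOf.get(node)) {
            if (!seen.add(node)) {
                throw new IllegalStateException(
                        "Loop detected at node " + node + " after visiting " + visited);
            }
            visited.add(node);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Well-formed chain: 4 -> 3 -> 2.
        System.out.println(walk("4", Map.of("4", "3", "3", "2")));
        // Corrupted chain: 2 points back at 4.
        try {
            walk("4", Map.of("4", "3", "3", "2", "2", "4"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The visited set costs memory proportional to the number of nodes walked, which is one trade-off any real change along these lines would need to weigh.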